Reading Files¶

example.txt¶

In [ ]:
%%file example.txt
This is an example file.
It has several lines of text.
Nothing terribly interesting.

🎨 open¶

In [ ]:
%%file read_example.py
file = open('example.txt')

for line in file:
    print(line)

file.close()

read_example.py¶

NOTES

  • open
  • Whatever we got back from open is iterable
    • Each element is a line from the file
  • We need to close the file when we are done.
When a thing can be used with for, that thing is iterable.

NOTES

  • What things do we know of so far that are iterable?
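One possible answer to the question above, as a minimal sketch. It assumes lists, strings, and range have already come up in class; the variable name collected is only for illustration.

```python
# Each of these can appear after `in` in a for loop, so each is iterable.
collected = []

for item in [10, 20, 30]:    # a list gives its elements one at a time
    collected.append(item)

for character in 'hi':       # a string gives its characters one at a time
    collected.append(character)

for number in range(3):      # range(3) gives 0, 1, 2
    collected.append(number)

print(collected)
```

A file fits the same pattern: each pass through the loop gives the next line.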

🖌 with¶

In [ ]:
with open('example.txt') as file:
    for line in file:
        print(line)

NOTES

  • What is different?
  • with open(...) as file
    • file is the variable name - you can pick whatever you want
    • with closes the file at the end of the block (so you don't have to)
  • We almost always use with when working with files
    • Occasionally a scenario comes up where with isn't what you need, but such scenarios are rare
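To see that with really does close the file, here is a small sketch that checks the file's closed attribute inside and after the block. The scratch file, the temporary directory, and the names closed_inside and closed_after are assumptions made so the sketch runs on its own.

```python
import os
import tempfile

# Setup for the sketch only: create a scratch copy of example.txt.
# The temporary directory and the 'w' (write) mode are not part of the lesson.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as file:
    file.write('This is an example file.\n')

with open(path) as file:
    for line in file:
        pass                      # read through the file
    closed_inside = file.closed   # checked while still inside the block

closed_after = file.closed        # checked once the block has ended

print(closed_inside)
print(closed_after)
```

The first value printed should be False and the second True, without file.close() ever being called.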

🧑🏽‍🎨 List of lines¶

In [ ]:
def get_lines(filename):
    with open(filename) as file:
        lines = []
        for line in file:
            lines.append(line)
        return lines
In [ ]:
print(get_lines('example.txt'))

What is the \n at the end of each line?

🎨 '\n'¶

We need a way to represent all characters in our code, including the ones you can't see (like space and newline).

In Python, this is done using the backslash character \. We sometimes call this the escape character.

When you see a \ in a string, it means the character after it is special.

\n does not mean "n", it means "newline".

In [ ]:
print('I have a dog. Her name is Sally.')
In [ ]:
print('I have a dog.\nHer name is Sally.')

Word to the wise

  • When you read the lines of a file, the newline character at the end of each line is preserved.
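A quick way to convince yourself that the newline is a real character in the string is to measure the string and look at its repr. This is only a sketch; the sample text is borrowed from the cells above.

```python
without_newline = 'I have a dog.'
with_newline = 'I have a dog.\n'

print(len(without_newline))   # number of characters without the newline
print(len(with_newline))      # one more: \n counts as a single character
print(repr(with_newline))     # repr shows the \n escape instead of breaking the line
```

This also explains the blank lines when printing lines read from a file: print adds its own newline after the one already in the line.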

🖌 list(...)¶

In [ ]:
def get_lines(filename):
    with open(filename) as file:
        lines = []
        for line in file:
            lines.append(line)
        return lines
In [ ]:
def get_lines(filename):
    with open(filename) as file:
        return list(file)
In [ ]:
print(get_lines('example.txt'))

NOTES

  • Anything you can pass to for...in... (i.e. an iterable) can be turned into a list using list(...)
  • Which version of get_lines do you like more? Why?
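The same idea works for iterables other than files. A minimal sketch, assuming strings and range are already familiar:

```python
letters = list('abc')       # a string is iterable, one character at a time
numbers = list(range(4))    # range(4) is iterable: 0, 1, 2, 3

print(letters)
print(numbers)
```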

🎨 .split()¶

In [ ]:
"this is a string containing several words\n".split()

NOTES

  • What is happening here?
  • A string is being turned into a list of words.
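Called with no arguments, .split() breaks the string on any run of whitespace and drops leading and trailing whitespace, including the final newline. A small sketch with a made-up string:

```python
text = '  this   is\ta string\n'   # extra spaces, a tab, and a newline
words = text.split()

print(words)
```

Note that the result contains no empty strings and no '\n'.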
In [ ]:
'1 2 3 4 5'.split()

How is

['1', '2', '3', '4', '5']

different from

[1, 2, 3, 4, 5]

NOTES

  • A string '1' is different from 1
  • Discuss the difference between values and string representations of a value
  • What if you want to convert '1' to 1?
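One way to make the difference concrete is to apply the same operator to both. This is just an illustrative sketch; the variable names are made up.

```python
joined = '1' + '1'     # + on strings glues the text together
summed = 1 + 1         # + on integers does arithmetic
same = ('1' == 1)      # a string never equals an integer

print(joined)
print(summed)
print(same)
print(type('1'), type(1))
```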

🖌 int(...)¶

In [ ]:
'7'
In [ ]:
int('7')
In [ ]:
int('12') * 2

🖌 float(...)¶

In [ ]:
'7.0'
In [ ]:
float('7.0')
In [ ]:
float('7.0') * 2
In [ ]:
int('7.0')  # raises ValueError: int() does not accept a string with a decimal point
In [ ]:
float('7')
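The two conversions are not symmetric: float accepts '7', but int rejects '7.0'. The sketch below shows this without stopping on the error. The helper try_int is hypothetical, and it relies on try/except, which may not have been covered yet.

```python
def try_int(text):
    """Hypothetical helper: returns int(text), or None if int() rejects the text."""
    try:
        return int(text)
    except ValueError:
        return None

print(try_int('7'))         # a plain integer string converts fine
print(try_int('7.0'))       # int() rejects the decimal point, so we get None
print(float('7'))           # float() accepts an integer string
print(int(float('7.0')))    # converting in two steps works
```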

How do you know whether to use int or float?¶

Look at the file!

The process of extracting values from a string is called parsing.

👩🏼‍🎨 print_means.py¶

In [ ]:
# Solution

def get_lines(filename):
    """Returns a list of the lines in the file."""
    with open(filename) as file:
        return list(file)
    
def get_numbers(line):
    """Parses out the integers in `line` into a list.
    
    >>> get_numbers("1 2 3")
    [1, 2, 3]
    """
    tokens = line.split()
    numbers = []
    for token in tokens:
        numbers.append(int(token))
    return numbers

def average(numbers):
    """Computes the average of a list of numbers.

    >>> average([1, 2, 3])
    2.0
    """
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)

def print_means(filename):
    """Prints the average value for each line in the file."""
    for line in get_lines(filename):
        numbers = get_numbers(line)
        ave = average(numbers)
        print(ave)

filename = 'data.txt'
print_means(filename)

NOTES

  • Invite the students to discuss with a partner
    • What is the goal?
    • What are the steps?
    • Draw it out together.
  • Draw out strategy on board. What is the goal? What are the steps?
    • use pseudocode
    • use dataflow diagrams, with arrows from one idea to the next
    • look at how the data type changes as it goes from step to step
  • Stub out code with comments, then function stubs.
  • Implement individual functions. Test them as we go with pytests.
    • Explain (again?) that pytests don't work well with files, so try to keep the functions that work with files simple.
  • Step through with a debugger as needed.
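As a sketch of what testing as we go might look like, here are two pytest-style tests for the functions that do not touch files. The test names and the chosen inputs are assumptions; the two functions are copied from the solution so the block runs on its own.

```python
def get_numbers(line):
    """Parses out the integers in `line` into a list."""
    tokens = line.split()
    numbers = []
    for token in tokens:
        numbers.append(int(token))
    return numbers

def average(numbers):
    """Computes the average of a list of numbers."""
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)

def test_get_numbers():
    # The trailing newline mimics a line read from a file.
    assert get_numbers('1 2 3\n') == [1, 2, 3]

def test_average():
    # Division with / always produces a float.
    assert average([1, 2, 3]) == 2.0
```

Because get_numbers takes a string and average takes a list, neither test needs a file on disk. Only get_lines and print_means depend on the file.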

Key Ideas¶

  • open
  • with ... as ...
  • \n
  • list(...)
  • .split()
  • int(...) / float(...)

Appendix¶

🎨 .strip()¶

Sometimes you don't want newlines at the end of your strings.

In [ ]:
def get_lines(filename):
    with open(filename) as file:
        lines = []
        for line in file:
            lines.append(line.strip())
        return lines
In [ ]:
get_lines('example.txt')
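Keep in mind that .strip() removes whitespace from both ends of the string, not just the newline. If leading spaces matter, .rstrip('\n') removes only trailing newlines. A small sketch with a made-up line:

```python
line = '  Nothing terribly interesting.\n'

stripped = line.strip()              # removes whitespace on both sides
newline_removed = line.rstrip('\n')  # removes only the trailing newline

print(repr(stripped))
print(repr(newline_removed))
```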